Reachability-Based Fault-Tolerant Routing
Abstract
Clusters of PCs are currently used as a cost-effective alternative to large parallel computers. In most of them it is critical to keep the system running even in the presence of faults. As the number of nodes in these systems increases, the interconnection network grows accordingly. With more components, the probability of faults increases dramatically, and thus fault tolerance in the system in general, and in the interconnection network in particular, plays a key role. An interesting approach to providing fault tolerance consists of migrating, on the fly, the paths affected by a failure to new fault-free paths. In this paper, we propose a simple and effective fault-tolerant routing methodology, referred to as Reachability-Based Fault-Tolerant Routing (RFTR), that can be applied to any topology. RFTR builds new alternative paths by joining subpaths extracted from the set of already computed paths, which makes it time-efficient. To avoid deadlocks, RFTR performs, if required, a virtual channel transition at the subpath union. As an example of applicability, we apply RFTR to InfiniBand. Evaluation results on tori show that RFTR exhibits a low computation cost and does not degrade performance significantly.
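The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of joining subpaths of already computed paths around a failed link. It assumes undirected links and paths stored as node lists, and the names `uses_link` and `join_subpaths` are hypothetical, not taken from the paper. The virtual channel transition used for deadlock avoidance is not modeled.

```python
from typing import Dict, List, Optional, Tuple

Path = List[str]
Link = Tuple[str, str]


def uses_link(path: Path, link: Link) -> bool:
    """True if the path traverses the (undirected) link in either direction."""
    a, b = link
    return any({u, v} == {a, b} for u, v in zip(path, path[1:]))


def join_subpaths(paths: List[Path], src: str, dst: str, faulty: Link) -> Optional[Path]:
    """Build a fault-free src->dst path from pieces of existing paths.

    Illustrative assumption: a new path is a subpath src..m of one surviving
    path joined with a subpath m..dst of another, for some join node m.
    """
    healthy = [p for p in paths if not uses_link(p, faulty)]
    prefixes: Dict[str, Path] = {}  # m -> shortest known subpath src..m
    suffixes: Dict[str, Path] = {}  # m -> shortest known subpath m..dst
    for p in healthy:
        if src in p:
            i = p.index(src)
            for j in range(i, len(p)):
                sub = p[i:j + 1]
                if p[j] not in prefixes or len(sub) < len(prefixes[p[j]]):
                    prefixes[p[j]] = sub
        if dst in p:
            j = p.index(dst)
            for i in range(j + 1):
                sub = p[i:j + 1]
                if p[i] not in suffixes or len(sub) < len(suffixes[p[i]]):
                    suffixes[p[i]] = sub
    best: Optional[Path] = None
    # Sorted iteration keeps tie-breaking deterministic.
    for m in sorted(prefixes.keys() & suffixes.keys()):
        candidate = prefixes[m] + suffixes[m][1:]
        if len(set(candidate)) != len(candidate):
            continue  # skip joins that revisit a node
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


# Four-node ring A-B-C-D-A with the link B-C failed.
computed = [["A", "B", "C"], ["A", "D"], ["D", "C"], ["B", "C"]]
new_path = join_subpaths(computed, "A", "C", ("B", "C"))
```

In this toy ring, the only surviving pieces are A to D and D to C, so the replacement path is joined at D. A real implementation would also have to decide whether the join requires switching virtual channels, which this sketch leaves out.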
Similar resources
Fault tolerant control with respect to actuator failures Application to steam generator process
This paper deals with the analysis of nonlinear reachability and fault tolerant properties of multiactuator nonlinear systems. In this case, the process is a steam generator process containing a set of actuators. After occurrence of one or several actuator faults detected and isolated by Fault Detection and Isolation (FDI) approaches, a quantitative analysis of the faulty system properties help...
Synthesis Methodology for Task Based Reconfiguration of Modular Manipulator Systems
In this paper, we deal with two important issues in relation to modular reconfigurable manipulators, namely, the determination of the modular assembly configuration optimally suited to perform a specific task and the synthesis of fault tolerant systems. We present a numerical approach yielding an assembly configuration that satisfies four kinematic task requirements: reachability, joint limits,...
Task Synchronization Process based on Petri Net
Task synchronization means that each redundant module has the same executing schedule in each task scheduling cycle of the operating system in the Triple Modular Redundancy (abbreviated TMR) fault-tolerant systems; it faces how to realize the coordination among the three modules. Therefore, it is necessary to investigate the task synchronization process of the TMR fault-tolerant system. In the ...
Design of modular fault tolerant manipulators
In this paper, we deal with two important issues in relation to modular reconfigurable manipulators, namely, the determination of the modular assembly configuration optimally suited to perform a specific task and the synthesis of fault tolerant systems. We present a numerical approach yielding an assembly configuration that satisfies four kinematic task requirements: reachability, joint limits, obst...
A Note on Fault Tolerant Reachability for Directed Graphs
In this note we describe an application of low-high orders [2] in fault-tolerant network design. Baswana et al. [1] study the following reachability problem. We are given a flow graph $G = (V, A)$ with start vertex $s$, and a spanning tree $T = (V, A_T)$ rooted at $s$. We call a set of arcs $A'$ valid if the subgraph $G' = (V, A_T \cup A')$ of $G$ has the same dominators as $G$. The goal is to find a valid set of mini...
Publication date: 2007